S-MART: Novel Tree-based Structured Learning Algorithms Applied to Tweet Entity Linking
نویسندگان
چکیده
Non-linear models recently receive a lot of attention as people are starting to discover the power of statistical and embedding features. However, tree-based models are seldom studied in the context of structured learning despite their recent success on various classification and ranking tasks. In this paper, we propose S-MART, a tree-based structured learning framework based on multiple additive regression trees. S-MART is especially suitable for handling tasks with dense features, and can be used to learn many different structures under various loss functions. We apply S-MART to the task of tweet entity linking — a core component of tweet information extraction, which aims to identify and link name mentions to entities in a knowledge base. A novel inference algorithm is proposed to handle the special structure of the task. The experimental results show that S-MART significantly outperforms state-of-the-art tweet entity linking systems.
منابع مشابه
The Effect of Transitive Closure on the Calibration of Logistic Regression for Entity Resolution
This paper describes a series of experiments in using logistic regression machine learning as a method for entity resolution. From these experiments the authors concluded that when a supervised ML algorithm is trained to classify a pair of entity references as linked or not linked pair, the evaluation of the model’s performance should take into account the transitive closure of its pairwise lin...
متن کاملA Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features
Named Entity Recognition is an information extraction technique that identifies name entities in a text. Three popular methods have been conventionally used namely: rule-based, machine-learning-based and hybrid of them to extract named entities from a text. Machine-learning-based methods have good performance in the Persian language if they are trained with good features. To get good performanc...
متن کاملSparse Structured Principal Component Analysis and Model Learning for Classification and Quality Detection of Rice Grains
In scientific and commercial fields associated with modern agriculture, the categorization of different rice types and determination of its quality is very important. Various image processing algorithms are applied in recent years to detect different agricultural products. The problem of rice classification and quality detection in this paper is presented based on model learning concepts includ...
متن کاملThree Statistical Summarizers at CLEF-INEX 2013 Tweet Contextualization Track
According to the organizers, the objective of the 2014 CLEFINEX Tweet Contextualization Task is: “...The Tweet Contextualization aims at providing automatically information a summary that explains the tweet. This requires combining multiple types of processing from information retrieval to multi-document summarization including entity linking.” We present three statistical summarizer systems ap...
متن کاملAn Improved Particle Swarm Optimizer Based on a Novel Class of Fast and Efficient Learning Factors Strategies
The particle swarm optimizer (PSO) is a population-based metaheuristic optimization method that can be applied to a wide range of problems but it has the drawbacks like it easily falls into local optima and suffers from slow convergence in the later stages. In order to solve these problems, improved PSO (IPSO) variants, have been proposed. To bring about a balance between the exploration and ex...
متن کامل